Temporal Models and Smoothing
Data are often observed in time, and time dependence is often expected.
Note: We can use the same model to smooth covariate effects!
Smoothing of the time effect
Prediction
We can “predict” any unobserved data, not only future data, e.g. gaps in the data etc.
Time can be indexed over a
Discrete domain (e.g., years)
Continuous domain
Time can be indexed over a
Discrete domain (e.g., years)
Main models: RW1, RW2 and AR1
Note: RW1 and RW2 are also used for smoothing covariates
Continuous domain
Goal we want understand the pattern and predict into the future
Random walk models encourage the mean of the linear predictor to vary gradually over time.
They do this by assuming that, on average, the time effect at each point is the mean of the effect at the neighbouring points.
Random Walk of order 1 (RW1) we take the two nearest neighbours
Random Walk of order 2 (RW2) we take the four nearest neighbours
Idea: \(\longrightarrow\ u_t = \text{mean}(u_{t-1} , u_{t+1}) + \text{Gaussian error with precision } \tau\)
Definition
\[ \pi(\mathbf{u} \mid \tau) \propto \exp\!\left( -\frac{\tau}{2} \sum_{t=1}^{T-1} (u_{t+1} - u_t)^2 \right) = \exp\!\left(-\tfrac{1}{2} \, \mathbf{u}^{\top} \mathbf{Q}\ \mathbf{u}\right) \]
\[ \mathbf{Q} = \tau \begin{bmatrix} 1 & -1 & & & & \\ -1 & 2 & -1 & & & \\ & & \ddots & \ddots & \ddots & \\ & & & -1 & 2 & -1 \\ & & & & -1 & 1 \end{bmatrix} \]
Idea: \(\longrightarrow\ u_t = \text{mean}(u_{t-1} , u_{t+1}) + \text{Gaussian error with precision } \tau\)
Definition
\[ \pi(\mathbf{u} \mid \tau) \propto \exp\!\left( -\frac{\tau}{2} \sum_{t=1}^{T-1} (u_{t+1} - u_t)^2 \right) = \exp\!\left(-\tfrac{1}{2} \, \mathbf{u}^{\top} \mathbf{Q}\ \mathbf{u}\right) \]
\(\tau\) says how much \(u_t\) can vary around its mean
We need to set a prior distribution for \(\tau\).
A common option is the so called PC-priors
inlabru for many model parametersThey are built with two principles in mind:
A line is the base model
We want to penalize more complex models
PC prior are easily available in inlabru for many model parameters
They are built with two principle in mind:
\[ \begin{eqnarray} \sigma = \sqrt{1/\tau} \\ \text{Prob}(\sigma>U) = \alpha;\\ \qquad U>0, \ \alpha \in (0,1) \end{eqnarray} \]
\(U\) an upper limit for the standard deviation and \(\alpha\) a small probability.
\(U\) a likely value for the standard deviation and \(\alpha=0.5\).
The Model
\[ \begin{aligned} y_i|\eta_i, \sigma^2 & \sim \mathcal{N}(\eta_i,\sigma^2)\\ \eta_i & = \beta_0 + f(t_i)\\ f(t_1),f(t_2),\dots,f(t_n) &\sim \text{RW2}(\tau) \end{aligned} \]
RW1 defines differences, not absolute levels:
Only the changes between neighbouring terms are modelled.
Mathematically, \[ (u_1,\dots,u_n)\text{ and }(u_1+a,\dots,u_n+a) \] produce identical likelihoods — they’re indistinguishable.
This means:
so parameters are not well-defined.
Solution:
[1] "FIT1 - Intercept"
mean sd 0.025quant 0.5quant 0.975quant mode kld
Intercept 174.138 0.002 174.135 174.138 174.141 174.138 0
[1] "FIT2 - Intercept"
mean sd 0.025quant 0.5quant 0.975quant mode kld
Intercept 0 31.623 -62.009 0 62.009 0 0
[1] "FIT1 - RW1 effect"
ID mean sd 0.025quant 0.5quant 0.975quant mode kld
1 1918 -0.122 0.015 -0.151 -0.122 -0.090 -0.122 0
2 1919 0.039 0.015 0.005 0.040 0.067 0.042 0
[1] "FIT2 - RW1 effect"
ID mean sd 0.025quant 0.5quant 0.975quant mode kld
1 1918 174.016 31.623 112.008 174.016 236.025 174.016 0
2 1919 174.177 31.623 112.168 174.177 236.186 174.177 0
\[ u_t = \text{mean}(u_{t-2} ,u_{t-1} , u_{t+1}, u_{t+2} ) + \text{some Gaussian error with precision } \tau \]
RW2 are smoother than RW1
The precision has the same role as for RW1
cmp1 = ~ Intercept(1) +
time(year, model = "rw1",
scale.model = T,
hyper = list(prec =
list(prior = "pc.prec",
param = c(0.3,0.5))))
cmp2 = ~ Intercept(1) +
time(year, model = "rw2",
scale.model = T,
hyper = list(prec =
list(prior = "pc.prec",
param = c(0.3,0.5))))
lik = bru_obs(formula = Erie~ .,
data = lakes)
fit1 = bru(cmp1, lik)
fit2 = bru(cmp2, lik)NOTE: the scale.model = TRUE option scales the \(\mathbf{Q}\) matrix so the precision parameter has the same interpretation in both models.
RW models are discrete models
Covariates are often recorded as continuous values
The function inla.group() will bin covariate values into groups (default 25 groups)
inla.group(x, n = 25, method = c("cut", "quantile"))
Two ways to bin
cut (default) splits the data using equal length intervalsquantile uses equi-distant quantiles in the probability space.The data are derived from an anthropometric study of 892 females under 50 years in three Gambian villages in West Africa.
Age - Age of respondent (continuous)
triceps - Triceps skinfold thickness.
Latent effects suitable for smoothing and modelling temporal data.
One hyperparameter: the precision \(\tau\)
It is an intrinsic model
The precision matrix \(\mathbf{Q}\) is rank deficient
A sum-to-zero constraint is added to make the model identifiable!
RW2 models are smoother than RW1
Definition
\[ u_t = \phi u_{t-i} + \epsilon_t; \qquad \phi\in(-1,1), \ \epsilon_t\sim\mathcal{N}(0,\tau^{-1}) \]
\[ \pi(\mathbf{u}|\tau)\propto\exp\left(-\frac{\tau}{2}\mathbf{u}^T\mathbf{Q}\mathbf{u}\right) \]
with
\[ \mathbf{Q} = \begin{bmatrix} 1 & -\phi & & & & \\ -\phi & (1+\phi^2) & -\phi & & & \\ & & \ddots & \ddots & \ddots & \\ & & & -\phi & (1+\phi^2) & -\phi \\ & & & & -\phi & 1 \end{bmatrix} \]
The AR1 model has two parameters
The AR1 model has two parameters
The precision \(\tau\)
pc.prec \[
\text{Prob}(\sigma > u) = \alpha
\]The autocorrelation (or persistence) parameter $(-1,1)
pc.cor0\[ \begin{eqnarray} \text{Prob}(|\rho| > u) = \alpha;\\ -1<u<1;\ 0<\alpha<1 \end{eqnarray} \]
The AR1 model has two parameters
The precision \(\tau\)
pc.prec \[
\text{Prob}(\sigma > u) = \alpha
\]The autocorrelation (or persistence) parameter $(-1,1)
pc.cor1\[ \begin{eqnarray} \text{Prob}(\rho > u) = \alpha;&\\ -1<u<1;\qquad &\sqrt{\frac{1-u}{2}}<\alpha<1 \end{eqnarray} \]
The Model
\[ \begin{aligned} y_t|\eta_t & \sim \text{Poisson}(\exp(\eta_t))\\ \eta_t & = \beta_0 + u_t\\ 1.\ u_t&\sim \text{RW}1(\tau)\\ 2.\ u_t&\sim \text{AR}1(\tau, \phi)\\ \end{aligned} \]
Number of serious earthquakes per year
hyper = list(prec = list(prior = "pc.prec", param = c(1,0.5)))
cmp1 = ~ Intercept(1) + time(year, model = "rw1", scale.model = T,
hyper = hyper)
cmp2 = ~ Intercept(1) + time(year, model = "ar1",
hyper = hyper)
lik = bru_obs(formula = quakes ~ .,
family = "poisson",
data = df)
fit1 = bru(cmp1, lik)
fit2 = bru(cmp2, lik)Estimated trend
Predictions
RW1 models can be seen as a limit for AR1 models
They are not too different when smoothing, but can
Sometimes the data are not collected at discrete time points but over continous time
One solution (not a bad one usually) is to discretize the time and use RW model (AR models are then harder to justify..)
_ A different solution is to use a continuous smoother based on a continous Gaussian Random Field (GRF) \(\omega(t)\)
What is this?
\[ (\omega(t_1),\omega(t_2),\dots,\omega(t_n))\sim\mathcal{N}(\mathbf{0},\mathbf{\Sigma}_{\omega(t_1),\omega(t_2),\dots,\omega(t_n)} ), \qquad \text{ for any } t_1,t_2,\dots,t_n\in\mathcal{T} \] Where \(\mathbf{\Sigma}_{\omega(t_1),\omega(t_2),\dots,\omega(t_n)}\) is a variance-covariance matrix.
To define \(\omega(t), t\in\mathcal{T}\), we have to define \(\Sigma\)
BUT…
Covariance matrices are not nice objects!
We define a (Matern) GRF as the solution of a stochastic partial differential equation (SPDE)
\[ (\kappa^2-\Delta)^{\alpha/2}\omega(t) = W(t) \]
What is this?
Think of white noise as someone randomly tapping along a rope.
But the rope has tension and stifness
The parameters:
Tension = controls how far randomness propagates (\(\kappa\))
Stiffness = controls how smooth the field becomes (\(\alpha\))
Ok…but we still need to solve the SPDE to find \(\omega(t)\)!
Now we need to discretize the domain into T points (we cannot compute on the continuous!)
We represent our solution as
\[ \omega(t) = \sum_{i = 1}^T\phi_i(t)w_i \] Where